COPASProfiler

COPASProfiler Package Description

COPASProfiler is an R package for viewing, filtering, classifying and analyzing data from the COPAS Large Particle Flow Cytometer. Below is a tutorial on how to install the package, obtain training data and use the functions and pipelines.

COPASProfiler Model Creation pipeline

Introduction

The COPAS Large Particle Flow Cytometer (COPAS FC) allows fluorescence screening of a large number of worms in a short time. However, the machine's output is hard to interpret. Worms can pass through the COPAS FC straight or folded over themselves. If a worm is folded, the fluorescence expression along its body overlaps, creating a different expression pattern in the data compared to a worm that passed through straight. The COPASProfiler package was created to help visualize the objects better. This tutorial is intended to teach users how to create their own SVM model to distinguish good worms (worms that passed through straight) from other unwanted objects.

How to get the package

You can access the package’s GitHub repository through this link: https://GitHub.com/ksnksa/COPASProfiler You can also run the following code to download the package directly in RStudio.

devtools::install_github("ksnksa/COPASProfiler/COPASProfiler")

To load the package, simply run the following code.

library("COPASProfiler")

What data set to use

The COPAS FC outputs more than one file for each run: (NameOfRun).txt, which contains the general information of all the objects, and four (NameOfRun)_ch#_prf.txt files (where # goes from 0 to 3), which contain the data from each channel. We’ll only be using the last four files. If you just want to follow this example, the tutorial loads the example files directly from our GitHub repository.
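As a quick sketch of that naming scheme (base R, with "MyRun" as a hypothetical run name used only for illustration), the five file names for a run can be reconstructed like this:

```r
# Hypothetical run name used only for illustration.
run <- "MyRun"
# The general-information file for the run.
summary_file <- paste0(run, ".txt")
# The four per-channel profile files (channels 0 through 3).
channel_files <- paste0(run, "_ch", 0:3, "_prf.txt")
channel_files
# "MyRun_ch0_prf.txt" "MyRun_ch1_prf.txt" "MyRun_ch2_prf.txt" "MyRun_ch3_prf.txt"
```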

How to get the annotated IDs

We also use a CSV file that contains annotated objects and their IDs. We’ll be using this file to create our model. To create your own annotation, you can go to our website, which allows you to visualize each object’s channel data.

1: Load the correct channel data file(s), then press submit. 2: The output plot of the selected object. 3: Here you can select which object ID you’d like to view. 4: You can check which channels you’d like in the plot. 5: These buttons allow you to annotate the objects: Desired Shape will annotate the object as a good worm, and Undesired Shape will annotate it as a bad worm. Note: the annotation label itself does not matter, so the same principle applies if, for example, you annotate profiles based on specific fluorescence expression. After annotating any number of objects, a button will appear to download the annotated IDs (next to the number 6).
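The downloaded file is a plain CSV. As a minimal sketch (base R; the column names here are illustrative assumptions, not the website's exact export format), writing and reading such a table looks like this:

```r
# Toy annotated-IDs table; the real file comes from the website's download button.
# Column names "ID" and "Label" are illustrative only.
ann <- data.frame(ID = c(12, 57, 103),
                  Label = c("Desired", "Undesired", "Desired"))
f <- tempfile(fileext = ".csv")
write.csv(ann, f, row.names = FALSE)
# Reading the annotation back into R.
readback <- read.csv(f)
```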

Running the pipeline

##Loading the libraries
library(ggplot2) 
library(dplyr) 
library(scales) 
library(COPASProfiler) 
library(kernlab)
library(jmotif) 
library(pracma)
library(xfun)

Loading the data

In this example, we used a sample data set generated by the Laboratory of Synthetic Genome Biology. You can run the following code to use the same data set and annotated IDs files we’re using. Ch0D, Ch1D, Ch2D and Ch3D are the paths for each channel profile. SummaryDataPath is the path for the summary file generated by the COPAS, which contains the overall summary of the profile measurements. GoodIDD is the path to the annotated IDs CSV file.

#Channel directory taken from the COPASProfiler GitHub repository
#When using your own data simply change these links to the file directory of the channels. 
Ch0D <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/1123.2C%20red%20500%20green%20600%20gain%202_ch0_prf.txt'
Ch1D <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/1123.2C%20red%20500%20green%20600%20gain%202_ch1_prf.txt'
Ch2D <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/1123.2C%20red%20500%20green%20600%20gain%202_ch2_prf.txt'
Ch3D <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/1123.2C%20red%20500%20green%20600%20gain%202_ch3_prf.txt'
#Summary data path taken from the GitHub repository
SummaryDataPath <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/1123.2C%20red%20500%20green%20600%20gain%202.txt'
#CSV file containing the levels (or annotation) for each worm in our training set. 
GoodIDD <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/1123.2c.csv'

Setting up the parameters

The following parameters can be changed to suit the user’s needs. MaxAmp, MinLength and MaxLength control the initial filtering in this pipeline: MaxAmp is the maximum amplitude an object can have, while MinLength and MaxLength set the minimum and maximum time of flight for each object. The purpose of this filtering is to remove unwanted bubbles or artifacts. ChannelNumber determines which channel the pipeline will work with: 1 is for Channel 0, 2 for Channel 1 and so on. Stage determines which worm stage the program will cluster. The package assigns a stage to each worm based on its time of travel; the thresholds for the time of travel were taken from experimental data generated by the SGB lab. When using your own data, the thresholds might differ depending on your COPAS FC acquisition parameters.
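A minimal base-R sketch of that initial filter, on toy data and assuming inclusive thresholds (the actual PreProcessData implementation may differ in details):

```r
MaxAmp <- 35000; MinLength <- 51; MaxLength <- 900
# Toy objects: TOF is time of flight, Peak is the maximum amplitude.
objects <- data.frame(TOF  = c(30, 120, 950, 400),
                      Peak = c(1000, 40000, 5000, 2000))
# Keep objects inside the TOF window and below the amplitude ceiling.
keep <- objects$TOF >= MinLength & objects$TOF <= MaxLength &
        objects$Peak <= MaxAmp
filtered <- objects[keep, ]
filtered  # only the fourth object (TOF 400, Peak 2000) survives
```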

#Setting up the parameters we need for the analysis
# Max amplitude an object can have 
# Anything more will be filtered out
MaxAmp <- 35000
# Minimum and maximum time of flight for each object
MinLength <- 51
MaxLength <- 900
# Which channel the pipeline will work on 
# 1 is for Channel 0, 2 is for Channel 1 
# 3 is for Channel 2, 4 is for Channel 3
ChannelNumber <- 1

Example run

Loading the training set

First we load up the channel data and apply an initial filter. Then we load up the annotated IDs csv file and create a variable that has the data for the worms that were annotated.

# The ReadChannel function simply takes in the channel directories and returns a list with all
# the channels data.
channellist <- ReadChannel(Ch0D,Ch1D,Ch2D,Ch3D)
# The PreProcessData function removes profiles outside the provided thresholds, removes trailing zeros from each profile,
# applies the PAA function to make the lengths uniform, and applies z-score normalization on the Y axis. 
ModData <- PreProcessData(channellist, MinLength,MaxLength,MaxAmp, ChannelNumber,SummaryDataPath)
# The CreateTrainingSetIDs function maps the annotated IDs CSV 
# file onto the data we have and returns the indices of the annotated worms.
WormIDs <- CreateTrainingSetIDs(ModData,GoodIDD)

Option 1: Model performance on the same individual set

To validate the model performance on the same set, one must split the annotated set into two separate sets first. The user can specify what percent of the annotated data set will be the training set. We recommend a 50/50 split for an initial validation test.

# If the SplitPercent variable is 0.5, then half of the annotated set 
# will be the training set and the other half will be the prediction set. 
SplitPercent <- 0.5
DataSetList <- RandomTrainingSet(ModData, WormIDs, round(length(WormIDs[[1]]) * SplitPercent),round(length(WormIDs[[2]]) * SplitPercent))
# TSet is the training set, and PSet is the prediction set. 
TSet <- DataSetList[[1]]
PSet <- DataSetList[[2]]

Next we will use the training set to make a classification model. The model will then classify the prediction set. Model performance will be calculated by comparing the predicted labels against the real labels of the prediction set. Additionally, the cross parameter in the ksvm function performs k-fold cross-validation, which gives us some insight into the model performance.

# Creating the model using the ksvm function 
model <- ksvm(as.matrix(TSet[,1:(MinLength-1)]), as.factor(TSet[,MinLength]),type = 'C-svc', kernel= "rbfdot",scaled=FALSE, cross = 5)
# Running the model against the prediction set. 
pred <- predict(model,PSet)
# The Positive variable stores the ID names of all profiles the model labeled "2" ("good"). 
Positive <- rownames(PSet[which(pred==2),])
# To find which of these IDs are True Positives (correct classifications),
# we count how many appear in WormIDs[[1]], which contains all 
# the annotated IDs of the good profiles.  
TP <- sum((Positive %in% WormIDs[[1]]), na.rm = TRUE)
# We repeat the same process with the "bad" ("1") profiles. 
Negative <- rownames(PSet[which(pred==1),])
TN <- sum(Negative %in% WormIDs[[2]],na.rm = TRUE)
# The code below calculates the false positive and false negative counts
FP <- length(Positive) - TP
FN <- length(Negative) - TN
# The code below prints the confusion-matrix metrics of the model 
Accuracy <- ((TP + TN) / (length(Positive) + length(Negative))) * 100
Precision <- TP/(TP + FP) * 100
Sensitivity <- TP/(TP + FN) * 100
Specificity <- TN/(TN+FP) * 100
paste("Model Accuracy is: ", round(Accuracy,digits=2), '% ','Precision: ',round(Precision,digits=2),'% ',
      'Sensitivity: ',round(Sensitivity,digits=2),'% ','Specificity: ',round(Specificity,digits=2),'%',sep='') 
## [1] "Model Accuracy is: 93.91% Precision: 89.82% Sensitivity: 95.31% Specificity: 93.01%"

Option 2: Loading a prediction set

The same parameters will be used on the prediction set as previously stated.

#Channel directory taken from the COPASProfiler GitHub repository
#When using your own data simply change these links to the file directory of the channels. 
Ch0D2 <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/N2%20red%20500%20green%20600%20gain%202_ch0_prf.txt'
Ch1D2 <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/N2%20red%20500%20green%20600%20gain%202_ch1_prf.txt'
Ch2D2 <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/N2%20red%20500%20green%20600%20gain%202_ch2_prf.txt'
Ch3D2 <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/N2%20red%20500%20green%20600%20gain%202_ch3_prf.txt'
SummaryDataPath2 <- "https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/N2%20red%20500%20green%20600%20gain%202.txt"
#CSV file containing the levels (or annotation) for each worm in our training set. 
GoodIDD2 <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/N2(MOSTIGREEN).csv'

Note: If you use the classification model generated with our code, make sure the MinLength variable does not change, as the data set used to create the model has a uniform length of MinLength - 1. The classification model will not work on profiles of a different length.
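One way to guard against this mismatch is to assert the column count before calling predict. A sketch, where PSet stands in for any preprocessed prediction matrix:

```r
MinLength <- 51
# Illustrative stand-in for a preprocessed prediction set (10 profiles).
PSet <- matrix(rnorm(10 * (MinLength - 1)), nrow = 10)
# PreProcessData compresses every profile to MinLength - 1 points,
# so a reusable model expects exactly that many feature columns.
stopifnot(ncol(PSet) == MinLength - 1)
```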

#Setting up the parameters we need for the analysis
# Max amplitude an object can have 
# Anything more will be filtered out
MaxAmp <- 35000
# Minimum and maximum time of flight for each object
MinLength <- 51
MaxLength <- 900
# Which channel the pipeline will work on 
# 1 is for Channel 0, 2 is for Channel 1 
# 3 is for Channel 2, 4 is for Channel 3
ChannelNumber <- 1
# The ReadChannel function simply takes in the channel directories and returns a list with all
# the channels data.
channellist2 <- ReadChannel(Ch0D2,Ch1D2,Ch2D2,Ch3D2)
# The PreProcessData function removes profiles outside the provided thresholds, removes trailing zeros from each profile,
# applies the PAA function to make the lengths uniform, and applies z-score normalization on the Y axis. 
ModData2 <- PreProcessData(channellist2, MinLength,MaxLength,MaxAmp, ChannelNumber,SummaryDataPath2)
# The CreateTrainingSetIDs function maps the annotated IDs CSV 
# file onto the data we have and returns the indices of the annotated worms.
WormIDs2 <- CreateTrainingSetIDs(ModData2,GoodIDD2)

Model performance on the other prediction set

We have the option to use the previous model we created, or to create a new model using the full annotated data set and validate it with the new data set. I suggest using the full data set to create the classification model, as such models tend to perform better.

# We will recreate the classification model using the first data set we loaded, but this
# time we will use (nearly) the full annotated set instead of splitting it in half. 
SplitPercent <- 0.99
DataSetList <- RandomTrainingSet(ModData, WormIDs, round(length(WormIDs[[1]]) * SplitPercent),round(length(WormIDs[[2]]) * SplitPercent))
# TSet is the training set, and PSet is the prediction set. 
TSet <- DataSetList[[1]]

Next let’s create the new classification model and validate the performance using the other data set we just loaded.

# Creating the model using the ksvm function 
model <- ksvm(as.matrix(TSet[,1:(MinLength-1)]), as.factor(TSet[,MinLength]),type = 'C-svc', kernel= "rbfdot",scaled=FALSE, cross = 5)
# Running the model against the prediction set. 
pred <- predict(model,ModData2)
# The Positive variable stores the ID names of all profiles the model labeled "2" ("good"). 
Positive <- rownames(ModData2[which(pred==2),])
# To find which of these IDs are True Positives (correct classifications),
# we count how many appear in WormIDs2[[1]], which contains all 
# the annotated IDs of the good profiles.  
TP <- sum((Positive %in% WormIDs2[[1]]), na.rm = TRUE)
# We repeat the same process with the "bad" ("1") profiles. 
Negative <- rownames(ModData2[which(pred==1),])
TN <- sum(Negative %in% WormIDs2[[2]],na.rm = TRUE)
# The code below calculates the false positive and false negative counts
FP <- length(Positive) - TP
FN <- length(Negative) - TN
# The code below prints the confusion-matrix metrics of the model 
Accuracy <- ((TP + TN) / (length(Positive) + length(Negative))) * 100
Precision <- TP/(TP + FP) * 100
Sensitivity <- TP/(TP + FN) * 100
Specificity <- TN/(TN+FP) * 100
paste("Model Accuracy is: ", round(Accuracy,digits=2), '% ','Precision: ',round(Precision,digits=2),'% ',
      'Sensitivity: ',round(Sensitivity,digits=2),'% ','Specificity: ',round(Specificity,digits=2),'%',sep='') 
## [1] "Model Accuracy is: 95.13% Precision: 92.84% Sensitivity: 96.39% Specificity: 94.15%"

Optimizing Model Parameters

To optimize the parameters of the model, we will use the mlr library, so we have to load it first.

library(mlr)
TSet$Factor <- as.factor(TSet$Factor)
ksvm_task = makeClassifTask(data = TSet, target = "Factor")
discrete_ps = makeParamSet(
    makeDiscreteParam("C", values = c(2 %o% 10^(-5:5)) ),
    makeDiscreteParam("sigma", values = c(2 %o% 10^(-5:5)) )
)
ctrl = makeTuneControlGrid()
rdesc = makeResampleDesc("CV", iters = 3L)
res = tuneParams("classif.ksvm", ksvm_task , rdesc, measures=acc, par.set = discrete_ps, control = ctrl)
print(res)
## Tune result:
## Op. pars: C=200; sigma=0.002
## acc.test.mean=0.9380054

The results will indicate the C and sigma values you can use to create the model. After changing the C and sigma values, test the model performance again to determine whether the new parameters improve the model. Note: be sure to test the performance against another data set to reduce the possibility of overfitting to the data set used to create the model.

To use the C and Sigma parameters, follow this example code:

model <- ksvm(as.matrix(TSet[,1:(MinLength-1)]), as.factor(TSet[,MinLength]),type = 'C-svc',kernel= "rbfdot",C=200,kpar = list(sigma = 0.002),scaled=FALSE, cross = 10)

Different Kernels

You can also change the kernel used to create the model. Here are some of the options: rbfdot, polydot, stringdot and besseldot. Alternatively, you can run ?ksvm and learn more about the available kernels from the documentation provided by the kernlab package.

Saving the Model

If the model performance is to your liking, then you can run the following code to save the model and use it later.

#Change the directory to where you would like to save the model. Change the filename but make sure the extension ends with ".R"
save(model,file='/directory/filename.R')

#To load the model again run the following code
ModelName <- load('/directory/filename.R')
model <- get(ModelName)

Transgene Plotter Pipeline

Introduction

The purpose of this tutorial is to present an easy-to-use pipeline for plotting the fluorescence of C. elegans. The COPAS worm sorter provides a txt-based output containing a summary of all the objects that were screened. The user then has to manually copy the information of the desired objects and plot it using third-party plotting programs. The functions in the COPASProfiler package allow plotting of individual runs or strains, or creating a summary plot containing different runs or strains. The functions can also use a classification model to remove unwanted profiles, or your own annotation to remove specific profiles. You can create your own annotated IDs file by visiting our website.

Libraries

## Loading the required libraries 
library(ggplot2)
library(dplyr)
library(reshape2)
library(COPASProfiler)
library(plotly)
library(utils)
library(e1071)
library(prospectr)
library(reshape)
library(tibble)
library(scales)
library(stringr)
library(pracma)
library(jmotif)
library(kernlab)

Running classification (Optional)

If you have a classification model you would like to apply to the data, you can use the code below to run the classification and create an ID file that can be used to remove unwanted profiles. The data set to be classified has to be either the “full file” text output or the first channel. In this example, we will use the first channel text file.

ModelDirectory <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/Model1123.R'
DataDirectory <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/N2%20red%20500%20green%20600%20gain%202_ch0_prf.txt'
#Minimum time of flight
MinTOF <- 51
#Maximum time of flight
MaxTOF <- 800
#Maximum optical density amplitude 
MaxPeak <- 35000
# DataType: 'Fullfile' means the input is the unsplit COPAS file; 'FirstChannel' means the first channel file
DataType <- 'FirstChannel'

Only run the code below if using the example model from the GitHub link; do not run it if using a model from a local directory.

ModelDirectory <- url(ModelDirectory)

Running classification.

AnnotatedIDs <- RunClassification(DataDirectory,ModelDirectory,MaxPeak,MinTOF,MaxTOF,TypeOfData = DataType)

Plotting One Strain/Run

The SummaryPlot function takes in one run summary file (in .txt format), the name of the strain/run and the desired fluorescence channel, and returns six plots. The function groups the objects into five groups based on their time of flight and plots each group individually. The final plot is a summary plot containing all five groups, and the last element of the output is a table of summary statistics.
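The grouping step can be sketched in base R with cut(), on toy data (the real function parses the COPAS summary file; the break points below are the default ranges stated later in this tutorial):

```r
set.seed(1)
# Toy objects with a time of flight and a fluorescence value.
d <- data.frame(TOF  = sample(51:800, 200, replace = TRUE),
                Fluo = round(rexp(200, 1/600)))
# Bin by time of flight using the default group boundaries.
d$Group <- cut(d$TOF, breaks = c(51, 75, 150, 225, 500, 800),
               include.lowest = TRUE)
# Per-group counts and medians, analogous to the statistics table the function prints.
table(d$Group)
tapply(d$Fluo, d$Group, median)
```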

Parameters

The data used here was generated by the Laboratory of Synthetic Genome Biology. You can access and download these data from our GitHub repository. The minimum parameters needed for the SummaryPlot function are FileDirectory, NameofStrain and Channel. Note: the input for these parameters must be quoted strings.

# File directory (the summary txt file output from the COPAS worm sorter) or the Fullfile 
FileDirectory <-'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/N2%20red%20500%20green%20600%20gain%202.txt'
# Name of the strain we're working with 
NameofStrain <- '"Data Set Name"'
#Which channel to plot, G is for green, Y is for yellow and R is for red 
Channel <- 'G'
#Measure: 'I' to plot the integral, 'H' to plot the maximum peak
MeasureType <- 'H'
#Scale: 'Normal' is the default. 'Log2' and 'Log10' change the scaling of the Y axis to logarithmic. 
ScaleType <- 'Normal'
#DataType: Default is 'Summary', which takes in the summary output file from the COPAS.
#'Fullfile' means the input is the unsplit COPAS file (usually ends in _prf.txt); 'Summary' is the file containing the summary of the profiles but not the individual profiles. 
DataType <- 'Summary'
#Maximum Fluorescence accepted. Any more than provided number will be filtered out. 
FluoMax <- 50000 #Input 'NA' to skip this step

Running The Function

P <- SummaryPlot(FileDirectory,NameofStrain,Channel,Measure = MeasureType,
                 Scale = ScaleType, TypeOfData = DataType, FluoThreshold = FluoMax)
P[[1]] 
P[[2]]
P[[3]]
P[[4]]
P[[5]]
P[[6]]
P[[7]]
##              Min 1st Qu Median      Mean  3rd Qu   Max Count
## TOF: 50-75    75 224.25  298.0  413.1328  451.75  1723   128
## TOF: 75-150  136 275.00  425.5  688.9000 1005.75  6388   210
## TOF: 150-225 193 447.00  572.0  654.0727  784.50  1503    55
## TOF: 225-500 279 435.00  562.0  923.6705  912.25 21693   440
## TOF: 500-800 323 572.00  662.0 1138.5679  936.00 25498   287
## All TOFs      75 404.00  583.5  863.1321  888.00 25498  1120

Remove Specific Worms From The Data Set

The function can also take another parameter (AnnotatedIDs) to remove specific objects from the data set before plotting. Using COPASWormTools, we can download the annotated IDs file and provide its directory to the function. The function will then remove the annotated ‘bad’ worms before plotting the strain/run.

# If using the classification model to remove specific worms, do not run the code below. 
# Run this code only if you will use a manual annotation to remove specific profiles
AnnotatedIDs <- "Data Directory here"
P <- SummaryPlot(FileDirectory,NameofStrain,Channel,Measure = MeasureType,
                 Scale = ScaleType,WormIDs = AnnotatedIDs, TypeOfData = DataType, FluoThreshold = FluoMax)
P[[1]] 
P[[2]]
P[[3]]
P[[4]]
P[[5]]
P[[6]]
P[[7]]
##              Min 1st Qu Median     Mean 3rd Qu  Max Count
## TOF: 50-75    NA     NA     NA      NaN     NA   NA     0
## TOF: 75-150   NA     NA     NA      NaN     NA   NA     0
## TOF: 150-225 531 676.75  822.5 822.5000 968.25 1114     2
## TOF: 225-500 313 384.50  437.0 495.1958 508.00 3362   143
## TOF: 500-800 323 535.00  627.0 746.4747 697.00 7109   158
## All TOFs     313 436.00  531.0 628.3861 653.00 7109   303

Running Custom Ranges

The user can also specify their own set of ranges.

#The TOF group ranges (default is 51-75, 75-150, 150-225, 225-500, 500-800)
TOFRanges <- c(51,500,550,600,900,1000)

Running the function with the new parameter.

P <- SummaryPlot(FileDirectory,NameofStrain,Channel,Measure = MeasureType,Scale = ScaleType
                 ,WormIDs = AnnotatedIDs, TypeOfData = DataType,Ranges = TOFRanges)
P[[1]] 
P[[2]]
P[[3]]
P[[4]]
P[[5]]
P[[6]]
P[[7]]
##               Min 1st Qu Median      Mean  3rd Qu  Max Count
## TOF: 51-500   313  385.0    440  499.7103  509.00 3362   145
## TOF: 500-550  323  483.0    525  679.1522  616.00 7109    46
## TOF: 550-600  443  514.0    602  746.2143  655.75 6719    42
## TOF: 600-900  504  653.0    755  845.9143  966.00 2981   105
## TOF: 900-1000 643  945.0   1025 1101.7619 1237.00 1713    21
## All TOFs      313  454.5    580  688.0167  729.50 7109   359

Plotting Multiple Strains/Runs

The function SummaryPlots - which should not be confused with SummaryPlot - takes in multiple files and names to provide a boxplot of all the provided runs/strains.

Parameters

The parameters of this function are similar to those above. The only difference is that FileDirectories and Names can take more than one input. Note: if using more than one input, the format for FileDirectories is c(‘File Directory 1’, ‘File Directory 2’, ‘File Directory 3’, …) and so on; the Names parameter follows the same format. The Measure parameter dictates what data is plotted; the default is ‘I’, which plots the integral of each object. The Scale parameter allows the user to change the Y-axis scale from the default ‘Normal’ to a log scale with ‘Log2’ or ‘Log10’.

FileDirectories <- c('https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/N2%20red%20500%20green%20600%20gain%202.txt',
                    'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/1123.2C%20red%20500%20green%20600%20gain%202.txt')
# In order of directories
Names <- c('Data #1',
           'Data #2')
#Which channel to plot, G is for green, Y is for yellow and R is for red 
FluorescenceChannel <- 'G'
#Measure: 'I' to plot the integral, 'H' to plot the maximum peak
MeasureType <- 'H'
#Scale: 'Normal' is the default. 'Log2' and 'Log10' change the scaling of the Y axis to logarithmic. 
ScaleType <- 'Normal'
#DataType: Default is 'Summary' which takes in the summary output file from the COPAS 
#'Fullfile' means the input data is the unsplit COPAS files (usually ends in _prf.txt)
DataType <- 'Summary'
#Maximum Fluorescence accepted. Any more than provided number will be filtered out. 
FluoMax <- 50000 #Input 'NA' to skip this step

Running The Function

P <- SummaryPlots(FileDirectories,Names,FluorescenceChannel, 
                  Measure = MeasureType,Scale = ScaleType, TypeOfData = DataType, FluoThreshold = FluoMax)
P[[1]]
P[[2]]
P[[3]]
P[[4]]
P[[5]]
P[[6]]

P[[7]]
##         Min 1st Qu Median      Mean 3rd Qu   Max Count
## Data #1  75  405.5  584.5  865.3348    888 25498  1114
## Data #2 163  642.0  885.0 1172.7493   1419 25364  1005

Running Classification on multiple files (Takes a long time)

Changing the Classify input to anything other than ‘NA’ will perform classification on each file before plotting.

#Classify: default is 'NA', assigning any other input will perform classification using the following model 
ClassifyInput <- 'Y'
ModelDirectory <- 'https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/Model1123.R'
# If the file type is "Summary", you must include the directory of the first channel in the same order as the summary files
FileDirectoryFirstChannel <- c("https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/N2%20red%20500%20green%20600%20gain%202_ch0_prf.txt",
                    "https://raw.githubusercontent.com/ksnksa/COPASProfiler/main/Data/1123.2C%20red%20500%20green%20600%20gain%202_ch0_prf.txt")

Note: if using the model provided by our GitHub, run the following code. Do not run it if the model is in a local directory.

ModelDirectory <- url(ModelDirectory)

Running the function with classification.

P <- SummaryPlots(FileDirectories,Names,FluorescenceChannel, FirstChannelDirectories = FileDirectoryFirstChannel,
                  Measure = MeasureType,Scale = ScaleType,TypeOfData = DataType,Classify =ClassifyInput,
                 ModelDirectory = ModelDirectory)
P[[1]]
P[[2]]
P[[3]]
P[[4]]
P[[5]]
P[[6]]

P[[7]]
##         Min 1st Qu Median     Mean 3rd Qu   Max Count
## Data #1 313 436.00    531 628.3861 653.00  7109   303
## Data #2 304 530.75    694 843.0215 887.25 25364   326

COPASWormTools

Introduction

COPASWormTools is a program with a graphical user interface that allows users to quickly view, annotate and run classification models on the outputs of the COPAS FC. The program features two tabs, “WormProfiler” and “SVM Clustering.” The program can be accessed through a web page here, or through the GitHub repository here by downloading the repository and running the code in RStudio. Note: for a faster and smoother experience (especially with large data sets) I recommend running the program through RStudio.

WormProfiler

The WormProfiler tab allows the user to upload COPAS FC data and instantly view each object’s profile. A brief tutorial for this tab is given in the “How to get the annotated IDs” section above.

SVM Clustering

In the SVM Clustering tab, the user provides a COPAS FC data set as well as a classification model. The model is used to perform classification, and the program then plots the results.

1: You can pick the input type. Two file types are supported: the full file, which contains all four channels’ data for each profile, and the individual channel files (EXT, Green, Yellow, and Red), which are .txt files that contain the channel data for each profile. 2: Here you can browse for the COPAS FC file. 3: Here you will upload the classification model to be used. Note: only .R and .r files are supported, and only classification models generated by the ksvm function can be used. 4: If this option is checked, the program will measure fluorescence in a specific location in the profile. 5: This option compares the highest fluorescence peak from both sides of the profile and orients the profile so the highest expression is on the left. This is useful if the orientation of the worm matters for analysis. 6: Profiles with a TOF higher than this number will be removed. 7: Profiles with an EXT higher than this number will be removed. 8: You can specify which channel to classify on. For instance, the classification model we provided is based on the optical density (EXT) and not fluorescence. 9: Clicking this button starts the classification. 10: Classification results will show here; the first plot is the overall summary plot of all profiles in the classification result. After pressing button 9, you will have the option to pick which classification result to view (described further below). 11: The next and previous worm options cycle through all profiles in the classification. Note: Worm ID#0 is the summary plot, not a real profile or ID. 12: This option can filter the profiles based on peak fluorescence. You can apply more than one filter at once. 13: After checking which fluorescence filter to apply, a slide bar will appear where you can set the fluorescence range you require. 14: After picking the fluorescence filtering parameters, press this button to filter the profiles. 15: This box counts how many profiles are in a specific classification. If you applied fluorescence filtering, this box updates with the number of profiles that fall within the fluorescence range you set.

1: After pressing the submit data button (shown in the previous picture), the classification will start. You will then be able to decide which classification result you’d like to view. 2: Pressing this button will plot the data (starting with the summary plot); an example can be seen in the previous picture (number 10). 3: You can view specific profiles here. 4: Pressing this button will download the IDs of the profiles in the current classification. 5: Pressing this button will download the profile data for all objects in the current classification. 6: In this section, you can control what will be plotted from each profile.

1: As mentioned previously, this option tells the program to measure fluorescence in a specific location. After checking the box, a slide bar will appear, ranging from 1 to 49. If you pick 20 to 40, the program will calculate the peak fluorescence from 20% to 40% of the profile, compare it to the peak fluorescence from 70% to 90% of the profile, and record the higher of the two peaks. 2: Here you can see which area the fluorescence is measured in. This bar updates automatically, so feel free to try different parameters in the slide bar. 3: The fluorescence results will show after you apply a fluorescence filter.
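That window comparison can be sketched as follows (base R, toy 50-point profile; the 50-percentage-point offset between the two windows is inferred from the 20-40 vs 70-90 example above, so treat it as an assumption):

```r
# Toy 50-point profile with its peak at the 80% position.
profile <- c(rep(1, 39), 8, rep(1, 10))
n <- length(profile)
# Peak fluorescence within a lo%-hi% window of the profile.
win <- function(p, lo, hi) max(p[ceiling(lo/100 * n):floor(hi/100 * n)])
left  <- win(profile, 20, 40)   # user-chosen window
right <- win(profile, 70, 90)   # opposite window, shifted by 50 points
max(left, right)  # the program records the higher of the two peaks
```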